Abstract. We present a system that constructs and maintains an up-to-date
co-occurrence network of medical concepts based on continuously
mining the latest biomedical literature. Users can explore this network 
visually via a concise online interface to quickly discover important and 
novel relationships between medical entities. This enables users to rapidly 
gain contextual understanding of their medical topics of interest, and 
we believe this constitutes a significant user experience improvement 
over contemporary search engines operating in the biomedical literature 
domain. 

1 Introduction 

Progress in the development of search engines and digital libraries in the last 
two decades has revolutionised access to biomedical literature. One only needs to 
recall that before the advent of PubMed and Google researchers had to mainly 
use library index cards to identify articles of interest. 

Besides giving instant access to papers based on keyword queries, search engine
systems enable users to browse citation information, thus offering a much
more efficient process for performing background research and literature reviews.
Google Scholar also provides authorship statistics that allow for quick
identification of influential and prolific researchers.

Nonetheless, keyword-based searches and citation analysis have their limitations.
Examples of medical information needs that cannot be fulfilled directly by
the current generation of search engines include “give me the list of all diseases
and syndromes that are comorbid or are otherwise related to asthma” or “give
me all pharmacological agents that have been used in the treatment of migraine”.
Whereas the latter type of query is self-explanatory, the former is significant
because asthma is a complex and multifactorial disease. Thus, the co-occurring
conditions may share common aetiologies with one or more of its factors or
may otherwise shed light on its causes, providing researchers and practitioners
with additional avenues for understanding the disease and exploring new
treatment options. While there are online resources that can give partial answers
to such questions (examples include WebMD, Medscape, the Mayo Clinic and
Wikipedia), these are curated manually and therefore may not always be
up-to-date. The alternative is to supply broad queries to a search engine of choice,
meticulously browsing and analysing the returned results to build the desired
big picture. The results will be organised according to the search engine’s global
ranking criteria, which, from the user’s perspective, are arbitrary with respect
to the actual information need. This makes the browsing process unnecessarily
time-consuming.

The above limitations have been apparent to the community for quite some 
time. In his 2002 paper “Mining the bibliome: searching for a needle in a haystack” 
[1], Les Grivell outlined a number of then-unmet challenges for biomedical search
engines, amongst them: 

— Reducing the number of search results to a manageable level: users normally 
address this by trying many different Boolean search term combinations to 
find the query refinement that frames their original query in the desired 
context. 

— Handling synonymy and polysemy of biomedical search terms: to lower the 
risk of missing relevant search results, users must apply their erudition and 
issue separate queries for each of the search terms’ synonyms. On the other 
hand, term polysemy further complicates the first problem mentioned above. 

— Extracting meaningful information from articles: the freedom of natural
language makes it difficult to extract relevant biomedical terms unambiguously,
thus reducing search accuracy.

To address these shortcomings, the author suggested visually structuring search 
results using contextual information facilitated by controlled vocabularies and 
biomedical ontologies. However, in the twelve years after Grivell’s paper was 
published there has been little indication that popular search engines are heeding 
his advice, while the amount of medical literature has since more than doubled 
- from 11 million at the time of his writing to almost 25 million articles today¹ -
and continues to grow at an exponential rate, making Grivell’s suggestions ever 
more pertinent. 

In this paper we present a novel search engine for medical publications -
called “Memantic” - that aims to tackle the challenges described above.
Memantic captures relationships between medical concepts by mining biomedical
literature and organises these relationships visually according to a well-known
medical ontology [5]. To give an example, a search for “Vitamin B12 deficiency”
will yield a visual representation of all related diseases, symptoms and other
medical entities that Memantic has discovered from the 25 million medical
publications and abstracts mentioned above, as well as a number of medical
encyclopaedias. Figure 1 shows a conceptual visualisation of this idea.

The user can explore a relationship of interest (such as the one between
“Vitamin B12 deficiency” and “optic neuropathy”, for instance) by clicking on it,
which will bring up links to all the scientific texts that have been discovered to
support that relationship (see Figure 2). Furthermore, the user can select the
desired type of related concepts - such as “diseases”, “symptoms”,
“pharmacological agents”, “physiological functions”, and so on - and use it as a filter
to make the visualisation even more concise. Finally, the related concepts can
be semantically grouped into an expandable tree hierarchy to further reduce
screen clutter and to let the user quickly navigate to the relevant area of interest
(Figure 3).

¹ As indexed by PubMed at the time of publication.

Fig. 1. Memantic’s visualisation of diseases related to vitamin B12 deficiency.


Fig. 2. Exploring the relationship between vitamin B12 deficiency and optic
neuropathy.


We believe Memantic can save a considerable amount of time for researchers, 
medical students and practitioners who are investigating a particular condition, 
symptom or drug by giving a quick yet rich “mind map”-like overview of related 
medical concepts. We call our system a knowledge discovery engine because 
in contrast to traditional search systems it allows the user to quickly identify 
relationships that were previously unfamiliar to them. We achieve this in two 
ways: 
Fig. 3. Hierarchically grouping diseases related to vitamin B12 deficiency.


— Concisely organising related medical entities without duplication: Memantic 
first presents all medical terms related to the query concept and then groups 
publications by the presence of each such term in addition to the query itself. 
The hierarchical nature of this grouping allows the user to quickly establish 
previously unencountered relationships and to drill down into the hierarchy 
to only look at the papers concerning such relationships. Contrast this with 
the same search performed on Google, where the user normally gets a number 
of links, many of which have the same title; the user has to go through each 
link to see if it contains any novel information that is relevant to their query. 

— Keeping the index of relationships up-to-date: Memantic perpetually renews
its index by continuously mining the biomedical literature, extracting new
relationships and adding supporting publications to the ones already
discovered. The key advantage of Memantic’s user interface is that novel
relationships become apparent to the user much more quickly than on standard search
engines. For example, Google may index a new research paper that exposes
a previously unexplored connection between a particular drug and the
disease that is being searched for by the user. However, Google may not assign
that paper sufficient weight for it to appear in the first few pages of the
search results, thus making it invisible to the people searching for the disease
who do not persevere in clicking past those initial pages.

The above points are illustrated in detail in Section 4 with a number of real-life
examples.
The inspiration for our approach comes from the work on word co-occurrence
networks done in the early 1980s [3]. There, the authors set out to create a map
of relationships between scientific concepts across a number of disciplines. To do
this, they compile a vocabulary of key scientific terms, turn them into graph
vertices, and create edges between any two such vertices for which the corresponding
terms are found to co-occur in scholarly texts under a set of threshold criteria
(such as the frequency of their co-occurrence). Finally, they construct a graph of
relationships between such terms by applying this procedure to a large database
of relevant literature. This graph is then visualised in two dimensions and can be
studied by researchers to explore relationships between key scientific concepts,
both within and across a number of different scientific domains. Our
interpretation of this idea benefits from the use of a thoroughly curated medical ontology
called Medical Subject Headings (MeSH) [3], where the relevant keywords for
biomedical concepts are well established and are hierarchically categorised, and
where synonymous terms are grouped together using unique identifiers.

The rest of this paper is structured as follows. Section 2 gives an overview of 
related work. Section 3 describes the algorithm for constructing the co-occurrence 
network of medical concepts and our user interface design for exploring it. We 
demonstrate the usefulness of our search engine in Section 4 with a number of 
case studies and conclude this paper with an outline of potential future work in 
Section 5. 

2 Related work 

2.1 Network medicine 

Network medicine links experimental data on gene, protein and metabolic
interactions with clinical knowledge about diseases into interaction networks. These
networks can be studied to enhance the understanding of diseases and their
treatments. Such networks offer a bird’s eye view of the multiple factors that can
impact a particular disease and lend themselves to computational and
mathematical analysis that can help identify novel disease pathways and predict patient
drug response. A good review of these methods can be found in [4]. An
illustrative example of this idea is the work by Goh et al. [5], where the authors construct
a network of human disorders and disease genes linked by known disorder-gene
associations, which they term a “diseasome”. They then visualise this network
as a graph, and use it to identify sets of genes responsible for multiple medical
conditions. Interestingly, they choose to manually categorise each disorder into
20 primary disorder classes - based on the physiological system affected by each
disorder - instead of using labels provided by a medical ontology such as MeSH.

Memantic differs from the above approach because it does not attempt to 
link medical concepts via shared gene or protein interactions. Instead, two
concepts are linked only when the potential relationship between them has already
been explicitly highlighted in scientific publications. We believe this makes the 
resulting network more useful in the clinical setting where medical specialists 
need existing research to support diagnoses and the associated treatments. 
2.2 Clinical decision support systems

A clinical decision support system (CDSS) is a software system that is designed
to assist healthcare professionals in clinical decision making. Research in this field
goes back to the early 1970s [6] and one of the earliest successful
implementations of this concept is the CASNET/Glaucoma system that uses the manually
supplied expertise of glaucoma specialists to construct a causal model of the
disease [7]. This model relates disease states (e.g. “open angle glaucoma” vs.
“acute closure glaucoma”) with clinical observations (e.g. “reduced visual
acuity” or “dilated pupil”) via a causal network of corresponding pathophysiological
states (“angle closure” leading to “elevated intraocular pressure”). The
therapist interacts with the system through a “consultation program” by answering a
series of questions about the patient’s medical background and presentation of
the disease. The system’s inference engine then uses the disease model to map
the supplied answers to the potential causes of the patient’s condition and the
recommended therapies. CASNET/Glaucoma has been able to achieve a high
level of competence in analysing complex cases of the disease.

Other notable systems originating from research done in the 1970s include
MYCIN [8,9], designed to diagnose and recommend treatments for a number
of blood infections (such as bacteremia or meningitis), ONCOCIN [10], built to
assist physicians with the treatment of cancer patients receiving chemotherapy,
and INTERNIST-I [11,12], used for the diagnosis of complex problems in general
internal medicine. DXplain [13], another clinical decision support system
developed in the early 1980s at the Laboratory of Computer Science, Massachusetts
General Hospital, has since become an online service with both educational and
clinical uses [14]. The system utilises a set of clinical findings to produce a ranked
list of diagnoses which might explain or otherwise be associated with the
clinical manifestations. DXplain contains 2,200 diseases and 5,000 symptoms in its
knowledge base.


Map of Medicine. Map of Medicine is a CDSS that is currently used by the 
UK National Health Service [15]. Information pertaining to a particular health 
condition is visualised as a ‘care map’, which is a flow chart made up of a 
series of steps to be taken with the patient, such as tests, treatments, referral 
to specialists, and links to other care maps. These maps are manually curated 
and kept up-to-date by a pool of medical specialists, who use the latest available 
medical evidence for this purpose. A healthcare practitioner can use these maps 
as a guide on how to tackle patients’ health problems. 


WatsonPaths. Like the Map of Medicine, IBM’s WatsonPaths visualises
medical information using flow charts. However, instead of building on the manual
efforts of medical curators, WatsonPaths constructs such flow charts - or “paths”
- automatically, by scanning medical literature. These can be used for managing
a previously identified medical condition and for establishing a set of possible
diagnoses based on observed symptoms and laboratory test results. To quote
IBM’s website: “when presented with a medical case, [WatsonPaths] extracts
statements based on the knowledge it has learned from being trained by medical
doctors and from medical literature. Using Watson’s question-answering
abilities, WatsonPaths can examine the scenario from many angles, working its way
through chains of evidence - pulling from reference materials, clinical guidelines
and medical journals in real-time - and drawing inferences to support or
refute a set of hypotheses. This ability to map medical evidence allows medical
professionals to consider new factors that may help them to create additional
differential diagnosis and treatment options. [...] WatsonPaths incorporates
feedback from the physician who can drill down into the medical text to decide if
certain chains of evidence are more important, provide additional insights and
information, and weigh which paths of inferences the physician determines lead
to the strongest conclusions.” [16] As WatsonPaths was not publicly available
at the time of writing, we could not review these features in greater depth.
However, it is reported that the system is currently undergoing trials in a number of
medical schools [17].

Google Health Cards. As of 2013, when a user issues a medical query, such
as “causes of multiple sclerosis” or “symptoms of the flu”, Google shows short
extracts from authoritative sites like WebMD and MayoClinic that give
summarised answers to the medical question that is presumed to be contained within
the query [18]. However, Google expressly disclaims responsibility for their
accuracy.


Memantic in the context of clinical decision support systems. Unlike 
clinical decision support systems, Memantic does not suggest diagnoses based 
on symptoms and other observations. Instead, it simply exposes relationships 
between medical concepts, leaving it up to the healthcare practitioner to choose 
how to use any novel information they discover for the medical case at hand. One 
important aspect of our system is that we do not automatically infer causality 
for any such relationship, instead encouraging the user to read the supporting 
literature to establish any possible causal aspects for themselves. Our philosophy 
is that the doctor should be in charge at all times and should not be unduly 
influenced by automated suggestions. Instead of prescribing diagnoses or specific 
courses of action, Memantic offers a quick way to broaden the practitioner’s 
horizons for a particular medical topic by highlighting relevant relationships 
that might otherwise be easily overlooked. 

2.3 Document and term clustering systems 

Grouping related terms and documents for improving search accuracy is a
well-established approach in information retrieval. Latent Semantic Indexing (LSI)
was the first formal framework for this purpose [19]. Within this framework, a
linear-algebraic technique called singular value decomposition is applied to the
matrix that represents the occurrence of terms within a collection of documents,
and the results are used to identify groups of terms that frequently co-occur.
The underlying assumption is that words that often appear together are likely
to have similar meanings (or, to put it in a different way, are “semantically
similar”). LSI offers a principled way of querying a document space such that it
becomes possible to retrieve the documents containing terms that are
semantically similar to those in the query. Therefore, if one searches for the term ‘car’,
the documents that have the word ‘automobile’ will be ranked highly in the
returned list of search results, even if those documents do not contain the term
‘car’ itself. However, such relationships between words are captured implicitly
by singular value decomposition and are not made known to the user. Thus, the
latter is left unaware of why a particular set of documents is ranked highly in
response to his or her query. Conversely, Memantic explicitly exposes the captured
relationships between medical concepts by letting the user interact directly with
the co-occurrence network.
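
As a rough illustration of the LSI framework described above, the decomposition
can be written in standard notation. The symbols below are illustrative labels
and are not taken from the cited work. Let $X$ be the term-document occurrence
matrix. Its singular value decomposition, truncated to the $k$ largest singular
values, gives

\[ X = U \Sigma V^{\top} \approx X_k = U_k \Sigma_k V_k^{\top} \]

A query vector $q$ over the same terms is then projected into the
$k$-dimensional latent space as

\[ \hat{q} = \Sigma_k^{-1} U_k^{\top} q \]

and documents are ranked by the cosine similarity between $\hat{q}$ and their
own latent representations (the rows of $V_k$). Terms such as ‘car’ and
‘automobile’ that occur in similar documents acquire similar latent
representations, which is why a document containing only one of them can still
match a query for the other. Nothing in this ranking tells the user which term
associations were responsible for a match.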

Vivisimo [20], a document clustering system developed in the early 2000s, 
partially addresses the implicit nature of LSI by hierarchically clustering search 
results based on the semantic similarity of the returned documents. Each cluster 
is assigned a textual label that corresponds to the most representative term out of 
those used for grouping the documents within that cluster. The cluster hierarchy 
is visualised as a one-dimensional, expandable tree, where each intermediate tree 
node represents a document cluster. The tree leaves are the actual documents 
belonging to one or more clusters. This approach is much closer to Memantic in 
its spirit than is LSI. However, since Vivisimo has to handle arbitrary queries 
that are not domain specific, it has to rely on automatic techniques for finding 
representative cluster labels, which can give rise to errors. In the case of
Memantic, such errors are prevented by the use of the MeSH ontology, which ensures
that only medically relevant terms are used for grouping documents together. 

3 System description 

3.1 Co-occurrence network construction 

Consider a dictionary set of medical terms $\{m \in M\}$ and a database of medical
documents $\{d \in D\}$. Denote $d^t$ as the title of document $d$, $d^a$ as its
abstract and $d^f$ as the document’s full text body, each represented as a string
of text. For each available document $d_i$, identify in $d_i^t$, $d_i^a$ and
$d_i^f$ the longest non-overlapping substrings that match medical terms in $M$.
Denote those as multisets $T_{d_i}$, $A_{d_i}$ and $F_{d_i}$ respectively. Further
denote $T\#t$ as the number of times the term $t$ occurs in the multiset $T$ and
$T'$ as the set of all terms in $T$. Define the co-occurrence operator between two
multisets of extracted medical terms as

\[ P \circ Q = \{ (p, q, z_{P,Q}(P\#p, Q\#q)) : p \in P', q \in Q' \} \]

The result of this operator is a set of tuples corresponding to all possible ordered
pairs of terms, whose first component is a term that occurs at least once in $P$,
the second is a term that similarly occurs in $Q$ and the third is the real-valued
output of the weight function $z_{P,Q}$ that takes into account the number of times
the first term appears in $P$ and the second in $Q$. Compute the co-occurrence
sets $T_{d_i} \circ T_{d_i}$, $T_{d_i} \circ A_{d_i}$, $T_{d_i} \circ F_{d_i}$,
$A_{d_i} \circ A_{d_i}$, $A_{d_i} \circ F_{d_i}$ and $F_{d_i} \circ F_{d_i}$. Define
a mapping function

\[ f : M \to \{ x : 1 \le x \le |M|, x \in \mathbb{N} \} \]

such that every term in the medical dictionary maps to a unique integer $x$
that can range between 1 and the size of the dictionary. Further define the
co-occurrence matrix as the real-valued $|M| \times |M|$ matrix $C$. Given a
co-occurrence set $P \circ Q$, for every tuple $(p_j, q_j, z_j)$ in the set, increment
$C(f(p_j), f(q_j))$ by $w_{P,Q} z_j$, where $w_{P,Q}$ is the weight coefficient for
$P \circ Q$. Apply this step for every co-occurrence set computed above. Repeat the
entire procedure for every document $d \in D$, such that $C$ becomes the matrix of
relationships between the medical terms in $M$, where $C(f(m_k), f(m_l))$
represents the ‘relatedness’ score between the terms $m_k$ and $m_l$.

In our original implementation of Memantic

\[ \forall p \in P', q \in Q' : z_{P,Q}(P\#p, Q\#q) = 1 \]

and the weight coefficients $w$ were determined heuristically such that each
co-occurrence set type (e.g. ‘title-abstract’) would have a corresponding constant
weight value across all documents.

In simple terms, C defines our co-occurrence network and indicates how often
pairs of medical terms occur together in medical documents. The weighting
function z and the weighting coefficients w are used to rank the relative impor¬ 
tance of two terms occurring together in the same title of a document versus e.g. 
one occurring in the document’s title and the other in the abstract of the same 
document, before computing the final relatedness score. 
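
The construction above can be sketched in a few lines of Python. This is a
minimal illustration under stated assumptions, not the actual Memantic
implementation: the numeric values in WEIGHTS are invented placeholders (the
text only says the coefficients were chosen heuristically), term extraction is
assumed to have happened already, and a sparse dictionary keyed by term pairs
stands in for the matrix C indexed through the mapping f.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical weight coefficients w_{P,Q}, one per co-occurrence set type.
# T = title terms, A = abstract terms, F = full-text terms.
WEIGHTS = {
    ("T", "T"): 3.0, ("T", "A"): 2.0, ("T", "F"): 1.0,
    ("A", "A"): 1.5, ("A", "F"): 1.0, ("F", "F"): 0.5,
}


def z(count_p, count_q):
    """Weight function z; constant 1, as in the original implementation."""
    return 1.0


def co_occurrence(P, Q):
    """P o Q: one (p, q, z) tuple per ordered pair of distinct terms of P and Q."""
    return [(p, q, z(P[p], Q[q])) for p, q in product(P, Q)]


def build_matrix(documents):
    """Accumulate relatedness scores over all documents.

    Each document is a dict holding Counter multisets under the keys
    'T', 'A' and 'F'. The result maps (term_p, term_q) to a score.
    """
    C = defaultdict(float)
    for doc in documents:
        for (first, second), w in WEIGHTS.items():
            for p, q, z_value in co_occurrence(doc[first], doc[second]):
                C[(p, q)] += w * z_value
    return C


docs = [{
    "T": Counter({"m1": 1, "m2": 1}),
    "A": Counter({"m2": 2, "m3": 1}),
    "F": Counter(),
}]
C = build_matrix(docs)
```

Following the definition literally, the pairs are ordered, so the score for
(m1, m2) need not equal the score for (m2, m1), and a term also co-occurs with
itself. Whether the deployed system symmetrises the matrix or discards the
diagonal is not stated in the text.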

3.2 Document database and medical dictionary 

To construct the above co-occurrence network, we used the entire catalogue of
medical publications available via PubMed [21] (approximately 25 million at
the time of writing). We analysed the articles’ full text where it was
accessible via PubMed’s Open Access policy and only their metadata otherwise
(title and abstract). Additionally, we indexed web pages about medical
conditions from the Medscape online medical encyclopaedia [22]. We used the National
Institutes of Health’s MeSH ontology as the source of entries in our medical
dictionary, utilising both the main descriptors and the supplementary concept records
available therein. We utilised MeSH unique identifiers as our medical terms to
ensure that synonymous entries in the dictionary map to the same underlying
medical concepts in the co-occurrence network.

3.3 Visualisation and user interface 

Querying. The user starts with a query containing a single medical concept.
This can be any term that is contained in the MeSH controlled vocabulary,
and is not restricted to diseases or symptoms. For example, a drug name or a
geographical region can also be supplied. If the query is not present in MeSH,
Memantic will offer a list of suggestions that are similarly spelled, ranked by
the Levenshtein distance between the query term and the suggested vocabulary
entries (Figure 4).

Fig. 4. Query interface, with a list of spelling-based suggestions for the ambiguous
query ‘Alzheimer’.
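
The spelling-based suggestion step can be illustrated with a short Python
sketch. It assumes a plain in-memory list of vocabulary entries and
case-insensitive comparison; the function names and the `limit` parameter are
hypothetical, and the text does not say how Memantic computes or truncates the
ranking in practice.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def suggest(query, vocabulary, limit=5):
    """Return up to `limit` vocabulary entries, closest spelling first."""
    q = query.lower()
    ranked = sorted(vocabulary,
                    key=lambda term: (levenshtein(q, term.lower()), term))
    return ranked[:limit]


suggestions = suggest("Malarai", ["Asthma", "Malaria", "Scrapie"], limit=2)
```

Ties are broken alphabetically here purely to make the output deterministic.
A vocabulary the size of MeSH would likely need an index rather than a linear
scan, which this sketch does not attempt.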


Search results. Search results are visualised as a two-dimensional tree, which
is centred on the tree root. The latter represents the query concept and the tree
leaves are the concepts connected to the query in the co-occurrence network².
The tree is rendered using the d3.js Collapsible Force Layout [23], which employs
an optimisation algorithm to evenly position the tree nodes on the screen. The
nodes can be dragged around and repositioned by the user to maximise the
legibility of the node labels. There are two modes of visualising the tree: hierarchical
and flat.

— In the first mode, intermediate tree nodes (coloured blue) are used to
hierarchically group leaf nodes according to their conceptual categories (Figure
5). The set of hierarchical categories of a leaf node is based on the MeSH
hierarchy descriptor of the node’s concept. An intermediate node can be
collapsed by clicking on it, which will hide all the descending nodes from view.
When the search results are returned, all intermediate nodes of depth one
and greater are collapsed by default. This is done to reduce screen clutter
and can be especially useful when the query concept has many connections.
The user can then expand the nodes that correspond to the categories of
interest, and do so progressively until the desired set of connections is revealed
(Figure 5). If an intermediate node has one child node, it is automatically
excised from the tree in the manner shown in Figure 6. This procedure is
applied repeatedly until there are no such nodes left in the tree. This is done
to reduce the effort of navigating through the tree structure.

— In the second mode, all the leaves are directly connected to the root node
(Figure 7). It is possible to toggle between the two modes by clicking the
“use hierarchies” checkbox in the floating toolbox located in the top right
hand corner of the screen.

² Memantic displays a concept only when it occurs together with the query term in
at least two different research articles or in at least one medical encyclopaedia entry
that directly concerns either that concept or the query term itself. This is done to
filter out spurious co-occurrences that are not scientifically significant.

Fig. 5. Hierarchically organised search results for the query in Figure 4.

Fig. 6. Removing an intermediate node with a single child from the tree structure prior
to visualisation.
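
The removal of single-child intermediate nodes can be sketched as a bottom-up
tree rewrite. The Node class and function name below are hypothetical, and the
sketch assumes the root (the query concept) is always kept, even if it ends up
with a single child; the text does not spell out that case.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def excise_single_child_nodes(node, is_root=True):
    """Replace every non-root node that has exactly one child by that child.

    Children are processed first, so chains of single-child nodes collapse
    in one pass, which is equivalent to applying the step repeatedly.
    """
    node.children = [excise_single_child_nodes(child, is_root=False)
                     for child in node.children]
    if not is_root and len(node.children) == 1:
        return node.children[0]
    return node


tree = Node("query", [
    Node("A", [Node("B", [Node("leaf1")])]),       # chain A -> B -> leaf1
    Node("C", [Node("leaf2"), Node("leaf3")]),     # genuine grouping, kept
])
tree = excise_single_child_nodes(tree)
```

After the call, leaf1 hangs directly off the root while the category C, which
groups two leaves, is left untouched.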


Leaf nodes. Clicking on the text of a leaf node will unveil a panel on the left 
side of the screen, listing links to scientific articles that contain the association 
between the leaf’s concept and that of the root node (Figure 9). The articles are
ordered by publication date, starting with the most recently published paper. 
The top of the publication panel contains a horizontally stacked bar chart that 
gives a visual breakdown of the number of relevant publications by each decade. 
Clicking on a decade bar will scroll the publication list to the last publication 
from that decade. 
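
The decade breakdown and the scroll behaviour can be illustrated as follows.
This sketch assumes the publication list is given as years sorted newest
first, and it reads “the last publication from that decade” as the most recent
one, which is an interpretation rather than something the text confirms. Both
function names are hypothetical.

```python
from collections import Counter


def decade_breakdown(years):
    """Return (decade, count) pairs, most recent decade first."""
    counts = Counter((year // 10) * 10 for year in years)
    return sorted(counts.items(), reverse=True)


def scroll_index(years_newest_first, decade):
    """Index of the most recent publication in the decade, or None if absent."""
    for index, year in enumerate(years_newest_first):
        if decade <= year < decade + 10:
            return index
    return None


years = [2014, 2013, 2011, 2005]
bars = decade_breakdown(years)
```

Here `bars` holds one entry per bar of the stacked chart, and
`scroll_index(years, 2000)` gives the list position to scroll to when the
2000s bar is clicked.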

If the association has also been found in one of the medical encyclopaedias 
indexed by Memantic, a link to the corresponding encyclopaedia article will be 
displayed in the publication panel above the list of scientific articles (Figure 10).
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Ocular Toxoplasmosis Send feedback Help Logout 

I^Showalloptions QUse hierarchies (^Display node weights Centre screen 
Link type: Diseases * 



Fig. 7. Flat result visualisation for the query “Ocular toxoplasmosis”. This is a conve¬ 
nient way to display concept connections when there aren’t too many of them. 
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^^^^Brdiovascular Diseases 
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^Exfoliation Syndrome 
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^Cataract 


^Retlrtal Degeneration 


• oUucoru SOpUo N.™ 


^Macular DegerveratJon 


Fig. 8. User expanding the tree presented in Figure [5] Expanding a collapsed interme¬ 
diate node changes its colour from dark to light blue. 

[Screenshot: publication list panel titled “Alzheimer Disease and Macular
Degeneration (10)”.]

Fig. 9. User clicking on a concept associated with the query in Figure [S]. In
this case, the user is exploring the connection between Alzheimer disease and
macular degeneration. The number in brackets is the total count of all relevant
articles returned in the publication list.


Clicking on the circle part of the leaf node will issue a new search query with 
the text of the node’s label, which, in effect, “re-centres” the user interface on 
the clicked concept. This allows the user to explore the co-occurrence network 
by jumping from one concept to the next. 


Link type. Memantic offers the ability to filter the visualised concepts by their
type, such as “diseases”, “pharmacological agents”, “therapeutic procedures”,
and so on. For example, Figures O [7] and [S] display only diseases that are
connected to the user’s queries, which is the default setting. The link type can
be selected in the drop-down box in the floating toolbox. By way of example,
Figure 11 shows the set of pharmacological agents discovered by Memantic to be
associated with rickets. Memantic uses the “semantic type” field of MeSH
dictionary entries to enable filtering concepts in the above manner; any semantic
type present in MeSH can be used for filtering purposes.
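
The filtering step described above can be sketched as follows. This is only an
illustration, not Memantic's actual implementation: the Concept class, the
filter_by_link_type function, the semantic type labels and the sample data are
all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A concept associated with the query (hypothetical structure)."""
    name: str
    semantic_type: str  # stand-in for the "semantic type" field of a MeSH entry


def filter_by_link_type(concepts, link_type):
    """Keep only the concepts whose semantic type equals the selected link type."""
    return [c for c in concepts if c.semantic_type == link_type]


# Illustrative data only; not taken from Memantic's index.
associated = [
    Concept("Vitamin D", "Pharmacologic Substance"),
    Concept("Osteomalacia", "Disease or Syndrome"),
    Concept("Calcitriol", "Pharmacologic Substance"),
]

agents = filter_by_link_type(associated, "Pharmacologic Substance")
print([c.name for c in agents])
```

Because the filter compares a single field, any semantic type value present in
the dictionary can serve as a link type without further changes.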


Leaf node colours. Orange nodes signify concept associations that were
discovered only in the scientific articles indexed from PubMed. Green nodes
represent associations that were found only in medical encyclopaedias, and
yellow nodes are used for associations that have been identified in both data
sources.
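
The colour rule above amounts to a small decision table. A minimal sketch,
assuming each association carries two boolean flags (the function name
leaf_colour and its parameters are hypothetical):

```python
def leaf_colour(in_pubmed: bool, in_encyclopaedia: bool) -> str:
    """Map the data sources supporting an association to a leaf node colour."""
    if in_pubmed and in_encyclopaedia:
        return "yellow"  # identified in both data sources
    if in_pubmed:
        return "orange"  # found only in articles indexed from PubMed
    if in_encyclopaedia:
        return "green"   # found only in medical encyclopaedias
    # An association with no supporting source should not be displayed at all.
    raise ValueError("an association needs at least one supporting source")


print(leaf_colour(True, False))
```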

Fig. 10. User clicking on a concept associated with the query in Figure 5. Now
the user is exploring the association between Alzheimer disease and progressive
supranuclear palsy. Note the medical encyclopaedia links that precede the list
of relevant scientific publications.

Fig. 11. Pharmacological agents associated with rickets.

Node size. The size of a leaf node (or its “weight”) is a function of the number
of publications supporting the association of its concept with the query. The
size of an intermediate node represents the sum of the sizes of all leaf nodes
that descend from that node. This is a quick way to indicate the amount of
available research for a particular association. It is possible to switch off the
node weighting by toggling the “node weights” checkbox (Figure 12).
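
The sizing rule can be sketched recursively. The text does not specify the
function that maps publication counts to leaf sizes, so the sketch below simply
assumes the identity function; the dictionary layout, the node names and the
counts are likewise illustrative assumptions.

```python
def node_size(node):
    """Size of a node in the concept tree.

    Leaf: its publication count (assumed weight function: identity).
    Intermediate node: the sum of the sizes of all leaves that descend from it.
    """
    children = node.get("children")
    if not children:
        return node["publications"]
    return sum(node_size(child) for child in children)


# Hypothetical tree with made-up publication counts.
tree = {
    "name": "Category A",
    "children": [
        {"name": "Concept 1", "publications": 10},
        {
            "name": "Category B",
            "children": [
                {"name": "Concept 2", "publications": 4},
                {"name": "Concept 3", "publications": 3},
            ],
        },
    ],
}

print(node_size(tree))
```

Switching node weighting off would correspond to drawing every node at the same
size instead of calling such a function.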


Fig. 12. Query results from Figure 11 with node weighting turned off.


Query box. Subsequent queries can be entered in the toolbox in the top right
corner of the screen. The toolbox also has the facility to offer query suggestions
and to submit user feedback (Figure 13).
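
The paper does not say how query suggestions are generated. One plausible
approach, shown here purely as an assumption, is fuzzy matching of the typed
query against the dictionary terms using Python's standard difflib module. The
function name suggest_queries and the sample terms are hypothetical.

```python
import difflib


def suggest_queries(query, dictionary_terms, limit=3):
    """Suggest dictionary terms that closely resemble a possibly misspelt query."""
    # Compare case-insensitively but return terms in their original spelling.
    lowered = {term.lower(): term for term in dictionary_terms}
    matches = difflib.get_close_matches(
        query.lower(), list(lowered), n=limit, cutoff=0.6
    )
    return [lowered[match] for match in matches]


terms = ["Rickets", "Asthma", "Migraine"]  # illustrative dictionary entries
print(suggest_queries("rikets", terms))
```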


Fig. 13. Floating toolbox displaying suggested corrections for a new query with a
spelling mistake.


3.4 Demonstration videos

In addition to the figures we present in this paper, we have created two demo
videos. The first video, located at http://youtu.be/201yfgNKzz<|l (also
available at http://vimeo.com/122447677 and
http://archive.org/details/MemanticAlzheimer), follows a user issuing a query
for Alzheimer disease, as shown in Figures 0] and [SI, navigating to the desired
disease connections through the medical concept hierarchy (Figure 5), and
exploring one connection that is only present in the PubMed database (Figure 9)
and one that is also present in the Medscape medical encyclopaedia (Figure 10).

The second video, uploaded to http://youtu.be/fH7BMVBaLZc (also available at
http://vimeo.com/122451794 and http://archive.org/details/MemanticRickets),
illustrates the same user issuing a query for rickets, changing the link type to
“pharmacological agents”, switching to flat visualisation mode (Figure 11), and
then toggling the “node weights” option.

4 Case studies

4.1 Broken heart syndrome, a.k.a. Takotsubo cardiomyopathy

Takotsubo cardiomyopathy is a temporary condition in which the heart muscle
becomes suddenly weakened. Also called acute stress cardiomyopathy, broken
heart syndrome and apical ballooning syndrome, the condition causes the heart’s
left ventricle to change shape. According to the British Heart Foundation, the
main symptoms are chest pains and breathlessness, similar to a heart attack.
However, the key difference from the latter condition is the absence of any
blockages in the coronary arteries, as confirmed by an angiogram. Although the
cause of this condition has not been confirmed, it has been reported that
approximately 75% of people diagnosed with Takotsubo cardiomyopathy have
experienced significant emotional or physical stress prior to the onset of the
symptoms. There is some evidence that the excessive release of hormones
(particularly adrenaline) during such periods of stress causes the weakening of
the heart muscle.

At the time of writing, the pages about Takotsubo cardiomyopathy on the
websites of the Mayo Clinic, Wikipedia [50] and Medscape echoed the above
description and cited psychological and physical stress as the main triggers
for the syndrome. Memantic reveals a somewhat more detailed picture (Figure 14).
It is immediately apparent that the largest group of associated diseases and
syndromes is of a cardiovascular nature, which is to be expected (Figure 15).
However, another large cluster represents nervous system diseases, of which
there is little or no mention in the above online resources (Figure 16). By way
of example, exploring the connection between Takotsubo cardiomyopathy and
epilepsy shows that most papers investigating the possibility of the latter
being a trigger have been published in the current decade (Figure 17). Figure 16
shows that most concept nodes in this subcategory are orange, which means that
the associated conditions were not listed in the Medscape medical encyclopaedia
when Memantic last indexed it. After reviewing the articles in the nervous
system disease category, one begins to suspect that brain disorders may also be
involved in the hormonal imbalances that can lead to this type of
cardiomyopathy.

Fig. 14. Diseases related to Takotsubo cardiomyopathy.


Fig. 15. Cardiovascular diseases related to Takotsubo cardiomyopathy.

[Screenshot: publication list panel titled “Takotsubo Cardiomyopathy and
Epilepsy (9)”.]

Fig. 17. Exploring the connection between Takotsubo cardiomyopathy and epilepsy.


4.2 Early diagnosis of Alzheimer disease by retinal examination

Alzheimer disease is a neurodegenerative disorder characterised by a decline in
cognitive function. It mainly affects the elderly and, while there are currently
no known effective treatments, accurate early diagnosis may prove critical for
possible future intervention approaches. Figure 18 shows Memantic visualising
diagnostic procedures related to Alzheimer disease. In particular, it shows the
recent research on the use of Optical Coherence Tomography (a retinal imaging
technique) for identifying the onset of the condition. Note that the orange
colour of the associated node indicates that this information was not present in
the Medscape medical encyclopaedia when Memantic last indexed it.

Fig. 18. The connection between Alzheimer disease and optical coherence tomography.


4.3 Exploring the association between measles and multiple sclerosis

Multiple sclerosis (MS) is a disease in which the immune system attacks the
protective sheath (myelin) that covers nerve fibres. Damage to myelin disrupts
the electrical signals that are passed along the nerves between the brain and the
rest of the body. This disruption can result in a variety of neurological symptoms,
such as the loss of motor skills and vision. Little is currently known about what
triggers the immune system to attack myelin, but at one point it was thought
that exposure to the measles virus could play a role. Figure 19 shows Memantic’s
visualisation of viruses related to MS and the list of publications concerning the
relationship between MS and the measles virus. The stacked bar chart that
breaks down publication numbers by decade indicates that this hypothesis was
investigated most intensively in the 1970s and 1980s and is no longer actively
pursued.
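
The per-decade breakdown behind such a chart can be sketched as a simple count
of publication years. This is a minimal illustration under stated assumptions:
the function name publications_by_decade is hypothetical, the years are made up
rather than taken from Memantic, and the stacking dimension of the chart is not
modelled.

```python
from collections import Counter


def publications_by_decade(years):
    """Count publications per decade, keyed by the first year of each decade."""
    counts = Counter((year // 10) * 10 for year in years)
    return dict(sorted(counts.items()))


# Made-up publication years for illustration only.
years = [1972, 1975, 1979, 1981, 1984, 1996, 2011]
print(publications_by_decade(years))
```

A distribution concentrated in early decades, as in this toy input, is the
pattern the text describes for a hypothesis that is no longer actively pursued.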



Fig. 19. The connection between multiple sclerosis and the measles virus. 


4.4 The link between asthma and vitamin D deficiency 

Asthma is a disease of the respiratory system that manifests itself through
recurrent episodes of airway constriction. It is considered a complex and
multifactorial disease, since a number of different genes are thought to play a
role in the disorder and a variety of environmental factors can act as triggers.
Figure 20 shows Memantic’s visualisation of diseases related to asthma and the
list of publications concerning the relationship of the disease to vitamin D
deficiency. The decade bar chart indicates that this research is relatively
recent, and the orange colour of the associated node implies that this
relationship is not mentioned in the Medscape medical encyclopaedia’s article
on asthma.

Fig. 20. The connection between asthma and vitamin D deficiency.


4.5 Mental health conditions associated with the use of cannabis 

An increasingly active debate about legalising cannabis has been taking place in
recent decades in much of the western world. The potential mental health risks
associated with the psychoactive substances present in the drug are a major
part of this debate. Figure 21 shows Memantic’s visualisation of the mental
and behavioural dysfunctions related to cannabis and the list of publications
concerning the relationship of the drug to psychosis. It is evident that most
of the research concerning this relationship has been carried out in the last two
decades. Schizophrenia, depression and bipolar disorder also feature prominently
amongst other related mental health conditions.


[Screenshot: publication list panel titled “Cannabis and Psychotic Disorders
(263)”.]

Fig. 21. The connection between cannabis and psychosis.

5 Future work

A significant drawback of our system is that queries can only contain one medical 
concept at a time. In the future, we would like to extend our approach to more 
complex queries, so that it would be possible to explore which concepts are 
related to a collection of medical terms, or to an arbitrary string of text that is 
not present in our medical dictionary. We would also like to offer users the ability 
to manually correct concept relationships that were extracted erroneously. 